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Abstract — We consider the problem of estimating sparse com- 
munication ctiannels in the MIMO context. In small to medium 
bandwidth communications, as in the current standards for 
OFDM and CDMA communication systems (with bandwidth up 
to 20 MHz), such channels are individually sparse and at the 
same time share a common support set. Since the underlying 
physical channels are inherently continuous-time, we propose a 
parametric sparse estimation technique based on finite rate of 
innovation (FRI) principles. Parametric estimation is especially 
relevant to MIMO communications as it allows for a robust 
estimation and concise description of the channels. 

The core of the algorithm is a generalization of conven- 
tional spectral estimation methods to multiple input signals with 
common support. We show the application of our technique 
for channel estimation in OFDM (uniformly/contiguous DFT 
pilots) and CDMA downlink (Walsh-Hadamard coded schemes). 
In the presence of additive white Gaussian noise, theoretical 
lower bounds on the estimation of SCS channel parameters in 
Rayleigh fading conditions are derived. Finally, an analytical 
spatial channel model is derived, and simulations on this model in 
the OFDM setting show the symbol error rate (SER) is reduced 
by a factor 2 (0 dB of SNR) to 5 (high SNR) compared to standard 
non-parametric methods — e.g. lowpass interpolation. 

Index Terms — Channel estimation, MIMO, OFDM, CDMA, 
Finite Rate of Innovation. 



I. Introduction 

Multiple input multiple output (MIMO) antenna wireless 
systems enable significant gains in both throughput and relia- 
bility fll-f?! and are now incorporated in several commercial 
wireless standards 15], El- However, critical to realizing the 
full potential of MIMO systems is the need for accurate 
channel estimates at the receiver, and, for certain schemes, 
at the transmitter as well. As the number of transmit antennas 
is increased, the receiver must estimate proportionally more 
channels, which in turn increases the pilot overhead and tends 
to reduce the overall MIMO throughput gains Q. 

To reduce this channel estimation overhead, the key in- 
sight of this paper is that most MIMO channels have an 
approximately sparse common support (SCS). That is, the 
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channel in each transmit-receive (TX-RX) antenna pair can 
be modeled as a discrete multipath channel, with the relative 
time delays being common across different TX-RX pairs. The 
commonality across the different antenna pairs reduces the 
overall number of degree of freedom to estimate, which can 
in turn be used to reduce the pilot overhead or improve the 
channel estimate. Also, in communication systems that depend 
on channel state feedback from the RX to the TX, the SCS 
model may enable a more compressed representation. 

To exploit the SCS property of MIMO channels, we propose 
a variant along El, lH of the finite rate of innovation (FRI) 
framework, originally developed in fW\. The method, which 
we call SCS-FRI, uses classical spectral estimation techniques 
such as Prony's method, ESPRIT and Cadzow denoising to 
recover the delay positions in frequency domain. The method 
is computationally simple, and our simulations demonstrate 
excellent performance in practical scenarios. The prposed 
SCS-FRI algorithm applies immediately to channel estimation 
in multi-output OFDM communication with contiguous or 
uniformly scattered DFT pilots. Interestingly enough it can 
be used on other modulation schemes provided a suitable 
pilot layout. The Walsh-Hadamard transform (WHT), used 
in CDMA downlink channel among others, qualifies if one 
controls the pilots layout in the WHT domain. 

We also derive a simple scalar formula for the Cramer-Rao 
bound on the estimation of separable ToAs, and also point to a 
more general result by Yau and Bresler |fTT| . Both bounds are 
extended to Rayleigh fading SCS channels to lower bound the 
expected estimation error in fading conditions. Our simulations 
indicate the proposed SCS-FRI method is close to this bound 
at high SNRs. 

A. SCS MIMO models 

Due to the physical properties of outdoor electromagnetic 
propagation, wireless channels are often modeled as having a 
channel impulse response (CIR) that is sparse in the sense 
that they contain few significant paths [12|. With multiple 
antennas, the CIR measured at different antennas share a 
common support, i.e. the times of arrival (ToA) at different 
antennas are similar while the paths amplitudes and phases 
are distinct. This sparse common support (SCS) channels is 
illustrate it in Figure [T] The SCS channel model is usually 
assumed in the literature, though its physical motivations are 
not always put forth. 

It is important to note that the sparsity and common support 
assumptions only hold with respect to the channel bandwidth 
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Fig. 1. (a) Transmission over a bandlimited medium witli two scatterers and P receiving antennas, (b) Tlie P channels contain two patlis arriving at tlie 
same time up to ite, and are thus no exact SCS channels for e > 0. 



TABLE I 

Channel bandwidth in popular wireless systems 



System 



Code Bandwidth B 



Resolvable 
distance c/B 



DVB-T 113 


DFT 


5-8 MHz 


38-60 m 


is-95 m 


WHT 


1.25 MHz 


240 m 


3GPP LTE ll6l 


DFT 


1.4-20 MHz 


15-215 m 


UWB 


— 


> 500 MHz 


< 60 cm 



B and the SNR of the channel. Indeed, in the presence of 
noise, resolution is limited by the inverse bandwidth 1/B, even 
if one knows exactly which parametric model the signal obeys. 
In practice, 1/10* of the inverse bandwidth is a reasonable 
resolution to shoot for. The limited resolution has the effect 
of clustering paths from a single scatterer into a single path 
(promoting sparsity), and the small shift in the ToA due to 
the distance between antennas becomes negligible (promoting 
common support). Table |T] gives the channel bandwidth of 
several modern standards and c/B which is the distance 
travelled by an electromagnetic wave in a time lapse equal 
to the inverse bandwidth. 

B. Related work 

In OFDM systems, the majority of commercial channel 
estimators often simply perform some form of linear filtering 
or interpolation of the pilot symbols ifTSJI . lfT6l . Such non- 
parametric techniques are computationally very simple, but 
fundamentally cannot exploit the common sparsity in MIMO 
channel models. Since the phases and magnitudes are generally 
independent on the paths on different antenna pairs, the 
frequency response of sparse common support (SCS) channels 
are not correlated in any simple manner that can be exploited 
by basic linear interpolation of pilots. 

A different line of work has proposed compressed sensing 
based methods for sparse channel estimation lfT7]| - ll20l . In the 
compressed sensing context, the SCS property is equivalent 
to joint or group sparsity for which there are several methods 
including group LASSO lEB, ||22, group OMP ED and behef 
propagation ll24i . Techniques for mixes of joint and individual 



TABLE II 

Channel estimation methods are naturally classified in terms 

OF THE channel PROPERTIES THEY EXPLOIT. 



Algorithm Exploited channel properties 
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sparsity are considered in 112511 . Il26ll . All of these compressed 
sensing methods, however, require that the delay locations 
are discretized and exact sparsity is achieved only when the 
true path locations fall exactly on one of the discrete points. 
With continuous value path locations, each path components 
will require a number of terms to approximate well, or 
demand a larger number of dictionary elements to offer a finer 
discretization. 

Another joint estimation problem with FRI signals is studied 
in EH. 

C. Contributions 

The contributions of this work are four- folds: 

• Extension of classical FRI sampling and estimation to 
multiple SCS channels (Section HH) 

« Derivation of simple scalar formulas for the CRB of SCS 
channels (Section Ullli 

• Application to OFDM and Walsh-Hadamard coded (e.g. 
CDMA downlink) communications with contiguous or 
uniformely scattered DFT pilots (Section |IV| ) 

• Characterization of a precise spatial analytical model for 
SCS channels (Section [V]i 

The proposed SCS-FRI algorithm stands out compared to 
FRI or lowpass interpolation as it exploits more channel prop- 
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erties, as indicated in Table |II] Lowpass based techniques are 
a sensible non-parametric approach as they exploit the short 
delay-spread property. In general, any estimation technique 
based on uniformly scattered DFT pilots uses this property, 
as it is a necessary condition to the unicity of the solution. 

We conclude our study with numerical simulations showing 
the efficiency of the SCS-FRI algorithm in a Rayleigh fading 
scenario, and compare its equalization gain to a standard non- 
parametric approach, i.e. lowpass interpolation in the DFT 
domain. 



II. Sparse Common Support FRI: Theory and 
Algorithms 

A. Problem formulation 

We consider the physical setup described in Figure [T|(a). 
A periodic signal of limited bandwidth is transmitted over a 
multipath channel and uniformly sampled by a receiver with P 
antennas. This leads to P parallel multipath channels as shown 
in Figure [r|(b). The channels either share a common support 
exactly, in which case they are called exact SCS channels, or 
approximately, in which case they are called SCS channels 
(e.g. Figure [11(b)). 

Consider P exact SCS channels shaped by a kernel ip, with 
complex baseband equivalent model: 



K 
k=l 



ck,p e C, tk e [0 r[ 



(1) 



where ip{t) is the r-periodic sine function or Dirichlet kernel: 

sin(7ri?t) 



ip{t) 



kei, 



smc{B{t - kr)) 



BTsin(2*) 



(2) 



The kernel Lp is considered periodic as the filtering of a 
periodically padded signal by a linear shift invariant filter. 
Therefore, linear convolution of the CIR with the shaping 
kernel becomes circular 

We assume that the bandwidth parameter B satisfies B = 
(2A/ + l)/r for M > K. The paths coefficients Ck.p are 
treated as complex random variables. N measurements yp[n] 
are acquired at a rate l/T — N/t (with r the signal period 
and N > Bt = 2M + 1) and corrupted by AWGN 

yp[n] = hp[n] + qp[n] n £ {0, . . . , N - 1}, (3) 

where q ^ A/c(0,o'^I) if the channel is complex-valued 
or q ^ A/^(0, ct^I) if real-valued. In the DFT domain, the 
received signal is: 



yp[m\ = ip[m\ 



K 

E 

fe=i 



Cfe,pM^'"*'=+%,[ 



(4) 



where W = e-^'fj/^ and p[m] = 1/(2M + 1) for \m\ < M 
and is zero otherwise. The goal is to estimate the support 
{tk}k=i...K and the paths amplitudes {ck,p}k=i...K,p=i...p 
from the NP samples collected in (O. Once the support is 
known, estimation of the path amplitudes is simple linear 
algebra as seen in (|4|i. 



B. Support recovery from baseband DFT coefficients 

We start from (|4|i. The DFT samples yp[m] in the baseband 
(|to| < M) are the DFT coefficients of the channel corrupted 
by some Gaussian noise. 

The noiseless DFT coefficients of a X-multipath channel 
have the well-known and interesting property to form a linear 
recurrent sequence of order K + 1, i.e., any coefficient hplw] 
(m > —M + K) can be expressed as a unique linear 
combination of the K previous DFT coefficients common to 
all indices m: 



K 



-M- 



Leituna 1. Given hp[m] = X]fc=i Ck^pW"^*'' for m 

K, . . . , M and ti ^ tj , Vi ^ j, there exists a unique set of 
coefficients {fk}k=i K such that: 



^pM = fihp[m - 1] + f2hp[m - 2] 



fKhp[m~K] 



where x^ — fix 



K-l 



Jk-iX — fx is the polynomial 



with roots {W*''}k=i,...,K- 

Proof A linear recursion of degree K can be written as: 

Xn = flXn-^1 H h fKXn-K, Ik ¥" 0- (5) 

Its characteristic equation is: 

^K - /ix^^i Jk-ix - fx - 0. (6) 

If Xx is a solution of ^ then multiplying both sides of the 
equation by A"~^ (7^ since fx 7^ 0) shows that A" is a 
solution of Q. Moreover by linearity, any linear combination 
of solutions of ^ is still a solution, and if ^ has K 
distinct solutions, {fk}k=i....,K is uniquely defined by a set of 
K independent linear equations. Hence, for J2k=i '^k,pW"^^'' 
"solution" of (|5]l, tk ^ ti mod t for all k ^ I, there exists a 
unique set {fk}k=i,....K such that {W'^''}k=i,....K are the K 
distinct roots of 



„K 



fix''-' fK-lX-f 
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The coefficients yp[m] maybe arranged in a tall block- 
Toeplitz matrix 



H^L) 
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, (7) 



where yij — yi[j]. The data matrix H^ ' is made of P 
Toeplitz blocks of size {2M + 2 — L) x L, and we assume 
that both block dimensions are larger or equal to K. It 
possess interesting algebraic properties which form the core 
of line spectra estimation techniques. We will use Lemma[T]to 
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show three well-known spectral estimation tools which extend 
straightforwardly from Topelitz data matrices to block-Toeplitz 
ones, i.e. extend from single output to multiple outputs with 
SCS. We do so, with two propositions: 

Proposition 1. [Annihilating filter property] 

In the absence of noise (yp[m] — hp[rn\}, a set of exact SCS 
channels with K distinct paths verifies 



Algorithm 1 Block-Prony_TLS 



HiK+i)f ^ 0, 



(8) 



where / = [1 — /i ■ ■ • — JkY^ ore the annihilating filter 
coefficients such that the polynomial Pf{z) = 1 — X]fc=i fkz'^ 
has K roots {e~'^''^^''/''}k=i...K- The matrix if (^^+i^ 
with blocks as in ^ (with L ~ K + 1). 



Proof: This is a direct consequence of Lemma \T\ ■ 

Proposition 2. For a set of exact SCS channels with K distinct 
paths and in the absence of noise, H^ ' satisfies 

rank H^^'> = K, 



for K <L<2M + 2- K. 

Proof Let H be the top-left K x K minor of H^^^ 
can be written as the sum of K rank-1 matrices: 

K 



It 



H = J2c,w^'^^+'-^^HU,, 



fe=l 



such that Ifc = [1 W*" W"^^" ■ ■ ■ 1^(^-1)*'=]. If tk ^ 
ti mod r for all k ^ I, then {^i, . . . , ^^} form a set of non 
colinear vectors. Therefore rank H^ ' > K. 

Choose K mutually independent rows of H^ '. From 
Lemma [T] truncating these row vectors to length K pre- 
serves the linear independence. Therefore, given a row h — 
[h[0] , . . . , h[L — 1]] of H^ ', there exists a linear combination 
of these K rows h' such that 

h[k] = h'[k] , k^O,...,K-l. 

By construction, since all rows of if' ' verify the the same 
linear recursion of degree K, h and h' verifies this recursion. 
Hence h — h', implying rank H^ ' is at most K. ■ 

a) Block-Prony algorithm: Proposition [T] is Prony's 
method 128)1 . ||29ll for block-Toeplitz matrices. We call the 
corresponding algorithm "Block-Prony TLS", listed under 
Algorithm [T] It solves the annihilating filter equation ^ 
in the total least-square (TLS) sense. The crucial step is 
the identification of what shall be the unidimensional null 
space of H^ ^ Mn a noiseless case. Solving this problem 
in the TLS sense yields the least right singular vector of 
H^ ^ '. Prony's method is notoriously sensitive to noise, 
which is to be expected as the result relies on identification of 
the unidimensional complement of the ilT-dimensional signal 
space. This sensitivity can be mitigated with prior denoising 
of the measurements. 

b) Block-ESPRIT algorithm: Proposition |2] implies each 
block in the data matrix shares the same signal subspace. 
Hence the ESPRIT TLS algorithm outlined in [301 applies 
as-is to the block-Toeplitz data matrix H^ '. The Block- 
ESPRIT TLS algorithm is outlined in Algorithm|2] The Block- 



Require: An estimate on the number of effective paths i^'^^', 
2Af + 1 (M > K) channel DFT coefficients yp[m] = 

Ef=i Cfc.pW^r^ + %[t^\ for \M <M,p^l...P. 
1: Build if(^"'+i) according to ^. 
2: Compute the SVD decomposition of the data matrix: 



jjiK- 



usv* 



3: c/) <— roots(t;), such that v is the right singular vector 

associated to the least singular value. 
4: return {ifc'}fe=i...K« i ^arg(?i. 



is built Algorithm 2 Block-ESPRIT_TLS 



Require: An estimate on the number of effective paths K'^^^, 
2M + 1 (M > K) channel DFT coefficients yp[m] = 
Ef=i Cfe.pT^r*' + 5pH for |m| < M, p = 1 . . . P. 

1: Build H^^''' according to ^. 

2: Compute the SVD decomposition of the data matrix: 

Extract the signal subspace basis Hq ~ V i:(m-i),1:K"'^- 
Extract the rotated signal space basis Hi = V2:M,1:K'^^- 
Solve Hi — Ho'4' in the TLS sense. 
return {i|"}fc=i...Kc„ ^ {-^arg Afc (*)}fc=i...if=«. 



Algorithm 3 Block-Cadzow denoising 



Require: A block-Toeplitz matrix ii' ' and a target rank K. 
Ensure: A block-Toeplitz matrix H^ ' with rank < K. 

1: repeat 

2: Reduce H^^'' to rank i^ by a ti'uncated SVD. 

3: Make H^^ ' p = I . . . P, Toeplitz by averaging diago- 
nals. 

4: until convergence 



Algorithm 4 SCS-FRI channel estimation 
Require: An estimate on the number of effective paths K^'^^, 
2M + 1 {M > K) noisy channel DFT coefficients 



Et 



Ck,pW, 



pyy N 



Qpi 



for 



< Af, p 



4: 



y'p[m\ 

1 ... p. 

Ensure: Support estimate {i|^'}fc=i...if'" 
1: Build H'-'^'^ according to ^. 

fj{M} ^ Block-Cadzow(ii'*''\ ivT"') [optional]. 

Update yp[m] with the first row and column of the 

denoised block if (,*^). 

Estimate the common support with Block-Prony _TLS or 

Block-ESPRIT_TLS. 

Estimate {cfc p} solving P linear Vandermonde systems 

©. 



ESPRIT algorithm fulfils the same goal as the Block-Prony 
algorithm, but its essence is entirely different. Where Prony's 
method identifies a line with least energy in a i^ dimensional 
space, ESPRIT finds the rotation between two /-iT-dimensional 
subspaces in an ^/-dimensional space. What makes ESPRIT 
much more resilient to noise is that the two subspaces are 
computed from the most energetic part of the signal. 
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c) Block-Cadzow denoising: Proposition |2] used together 
with the block-Toeplitz structure property yields the "lift-and- 
project" denoising Algorithm [3] which we call Block-Cadzow 
denoising OTI . Using the same argument as in lf32ll . the block- 
Cadzow algorithm provably converges. 

d) SCS-FRI: We have all the elements to describe the 
SCS-FRI algorithm. The Block-Cadzow algorithm may be 
used to denoise the measurements and is followed by either 
Block-Prony or Block-ESPRIT estimation of the common 
ToAs (solved in the TLS sense). 

For Cadzow denoising and ESPRIT, it is empirically found 
that a data matrix with square blocks works well. The last 
step is to estimate the path amplitudes independently for each 
channel. This is done by solving a linear Vandermonde system 
© |33|. The processing chain at the receiver is listed in 
Algorithm |4] and shown in Figure |2] Combination of Cadzow 
and ESPRIT for estimation of a single OFDM channel is 
considered in 1341 . We assumed the number of paths to be 
known. For estimation techniques of the number of paths, we 
refer to Ea. 



III. Estimation theoretic bounds on SCS-FRI 

RECOVERY 

A. Deterministic multipath channels 

In [To], the authors derive the Cramer-Rao lower 
bound |35l , |f36l for estimating the positions and weights 
of the Diracs in FRI signals. Considering a single Dirac 
with deterministic amplitude in a single-channel real-valued 
scenario, the minimal relative uncertainties on the location of 
the Dirac, ti, and on its amplitude, ci, are given by 
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"/Atiy 




V ^ / 


E 


"/Aciy 




_V Cl / _ 
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3(2Af + 1) 
4:7r^NM{M + r 



PSNR" 
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where PSNR = c^/cr^ is the input peak signal to noise ratio. 
When there are more than two Diracs, the Cramer-Rao formula 
for one Dirac still holds approximately when the locations are 
sufficiently far aparJH 

B. Jointly Gaussian multipath channels 

We derive bounds on the support estimation accuracy with 
measurements taken according to (O. The paths coefficients 
Ck,p are assumed to be jointly Gaussian, and modeled as the 
product of ak,p = E[|cfe^p|] by a standard normal random 
variable Z^.p having the following properties, consistent with 
the well-known Rayleigh-fading model: 

. Zk^p^Afc{0,^/T/2l). 

• Similar expected path amplitude between antennas: 

def _ _ _ 

Ofc — Ofc,! — 0,k,2 — • • • — flfc.P- 

• Independence between paths: E 

k'. 



^k,pZl,p, 



0, k^ 



• The random vector Z^ = [Zk.i ■ ■ ■ Z^p]'^ is defined as 
Zk = ifcT, where L^. is the Cholesky factor of the 
covariance matrix Rk = E[Zf;Zl] and r is a vector 
of iid standard complex Gaussian random variables. 

The Rayleigh-fading case can be seen as deterministic if 
conditioned on the path amplitudes. Thus, the Cramer-Rao 
bounds for random paths coefficients are random variables for 
which we can compute statistics. Expectation and standard 
deviation will respectively give the expected accuracy of the 
estimator and its volatility. For a single path, and a symmetric 
or antisymmetric (p (not necessarily a sine kernel), the Cramer- 
Rao bound has a concise closed form formula: 

Proposition 3. With complex-valued measurements according 
to (O, K — 1, and Z\ be a random Gaussian vector , then 



E 



.Ail 2 



E 



> 



\ziz,)-' 



2N ■ dSNR 



(9) 



where dSNR = \ai\'^\\ip'{nT-ti)\\'^/{Na'^) is the differential 
SNR and Ai, • • • , Xp are the eigenvalues of the covariance 
matrix Ri and P > I: 

• Uncorrelated paths coefficients, Xi = ■ ■ ■ = Xp = 1: 



E 



iziz,y 



(p-i) 



p> 1. 



Correlated path coefficients, such that Xi ^ ■ ■ ■ 

E[(z*zo-^l=f:(-A,r^i^n(v 



^Ap." 

p=l p p'^p 

Proof: See ll37l . The uncorrelated case is found in various 
statistical handbooks as the first moment of an inverse-x^ 
distributed random variable. For the correlated case, see fMI- ■ 

This expression is a suitable approximation for multipaths 
scenario with distant paths (separated by more than twice 
the inverse bandwidth. It gives an important insight on the 
evolution of the estimation performance when uncorrelated 
antennas are added to the system. Namely, the RMSE decays 
as 1/VP^T. 

In general, multiple paths are interacting with each other and 
the information matrix cannot be considered diagonal. I n this 
case Yau and Bresler lllli derived the following expression: 

Proposition 4. i lTTV Let $ and ^' be N x K matrices such 
that 

<i>„,fc = ^((n - 1)T - tk) , K.k = ^'{{n - l)T - h), 



ne {1,...,N}, ke {1, 
C 



, K}. Given the stochastic matrix 



'Empmcally, the distance should be larger than 2/B. 



diag{ai, ..., uk) ^ Z'^Z'* j diag{al, 

with Z' = \Z\_p ■ ■ ■ Zk,p^ , the Fisher information matrix J 
conditionned on the path amplitudes is given by 

J = 2(J-^^'*Pker^^' C. (10) 

such that Pker^ =1— ^^^ is the projection into the nullspace 
of $ and "0 " denotes the entrywise matrix product. 
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Fig. 2. The SCS-FRI sampling and reconstruction scheme in a multi-antenna channel estimation setting with P receiving antennas. 



See ifm for the proof. The Cramer-Rao bounds for the 
estimation of the normahzed times of arrival are on the 
diagonal of the expectation of J^^. The matrix J is a complex 
Wishart matrix. Computing its inverse moments analytically is 
not an easy task, nevertheless it can be numerically computed 
via Monte-Carlo simulations. 



IV. Application to OFDM and CDMA downlink 

1) SCS-FRI with uniformly scattered DFT pilots (OFDM): 
The theory in SectionHJis developed for contiguous DFT coef- 
ficients. In OFDM communications, pilots are often uniformly 
laid out in frequency (ETSI DVB-T |T3l, 3GPP LTE |61,. . .). 
The period of pilot insertion D is upper-bounded by A^^, the 
inverse of the delay-spread of the CIR : D < r/A. If not, 
the CIR cannot be unambiguously recovered from the pilots 
because of aliasing. For a fixed number of pilots, D is chosen 
as large as possible (D — [t/AJ), as interpolation of the CIR 
spectrum is more robust than extrapolation. 

SCS-FRI can take advantage of uniformly scattered pilot 
layouts im, im. For ip flat in {~MD, ..., MD}, equation 
dU becomes: 



K 



yp[ 



^d]-j: 



Ck,pW}^^ 



iDti^ 



qp[mD], 



(11) 



k=l 



which corresponds to a dilation by D of the support parameters 
{tk}- By definition < ife < A, and so the bound on D 
prohibits aliasing of Dtk- Therefore, SCS-FRI is applicable 
without other modification than division of the recovered 
support parameters by D. The results of Proposition [3] can 
be extended to scattered pilot with minimal effort. 

Corollary 1. The minimal uncertainties on the estimation of 
the parameters in the SCS-FRI scenario ( liil ) with P signals 



are given by 





f/Afi^ 


2' 
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K 








V ^ y 
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'/Ace'' 


2' 








V c^ / 


/ 



> 



3BT 



4D27r2M(Af + 1) 
> BTE[PSNR7^] 



[ESNR"^' 

^ = 1,..., 



P. 



For real-valued signal and noise ESNR = -ij X]^=i '^f denotes 
the effective signal to noise ratio and PSNR^ = c^/c^ .. For 

complex-valued signal and noise ESNR = ^^ S^=i c|q (^nd 
PSNR^ = c|Q/(2cr2).. 



Proof: 

E 



Ml 

T 



= E 



Apti) 



■D- 



Evaluation of E 



( t{Dt^\ 



based on measurements from 



(fTTT i is answered by Proposition [3] The dSNR is explicitly 
computed for (fTTT i taking </? = sinc^. ■ 

2) Extension to Walsh-Hadamard coded schemes (CDMA): 
Numerous applications use the 2"-WHT to code the channel 
into 2" subchannels (N — 2"). Among others, IS-95 uses a 
64-WHT to code the downlink channel. The straightforward 
way to insert pilots is to use one of these subchannels as a pilot 
itself and use correlation based channel estimation methods as 
the Rake-receiver for example BOl . The SCS-FRI algorithm 
works in the DFT domain but can nevertheless be applied in 
the WHT domain with pilots uniformly scattered by I? a power 
of 2. 

Proposition 5. Let Wn and Sn be respectively the 2"-points 
DFT and WHT matrices obtained by Sylvester's construction: 



-^T. 



1 1 
1 -1 



Si+i — Si 
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Then, for any £ € {l,...,n— 1} the set of Sn's columns 
with indices m |2^ + i} ^ and the set ofWn's columns 
with indices in {(i — 1/2) • 2"~^ + 1} ^ span the same 
subspace. 

Proof: We partition the Walsh-Hadamard transform ma- 
trix in two "left" and "right" blocks: 



Sn^ 



<f(0 qir)] q(.l) - 



Sn-1 



1 '-'n — 






Given u;^„ = [W^^ ■ ■ ■ Wkt ' ] the k*^ vector of the DFT 
basis and s^''' G span S)^': 



N-l 



1=0 
1=0 



Hence 



Wn 



„, s^^A^O, forfc 



even. 



The spans of S^^' and S'Jj'^' partition the original 2"- 
dimensional space in two subspaces of dimension 2"^^. Let 
Wl°' — {w2r,}o<2k+i<N,keN be the DFT basis vectors with 
odd indices and W^f — {w2,^}o<2k<N.keri the ones with 
even indices. The spans of W)"' and T^„ •* partition the 
original space into two subspaces of dimension 2"^^. Since 
span S'f;' ± span W^f^: 

span Vrl°^ =span S^;\ 
span Vr,[,'=' =span S^^l 

This property applies recursively, since for k e {0, . . . , 2"^^ — 
1}: 






w\= ^ 



s.^ ' ) = 



/ 1 

\\/2 




1 

' V2 


Sn-l 


lt>2"- 


1, Sn — 1 


), 





= ( 

where s„_i G span Sn-i- ■ 

Proposition |5] states that one can choose 2^ contiguous 
Walsh-Hadamard codewords for pilots and get 2^ uniformly 
spread DFT pilots with layout gap D = 2"^^. The channel 
coding is akin CDMA, but the pilot layout matches the 
one used in OFDM communication. The lesson, is that the 
Walsh-Hadamard transform alone achieves "scrambling" of 
data followed by carrier mapping in the DFT domain in a 
fashion similar to SC-FDMA (4T\. In SC-FDMA, the data are 
first "scrambled" by application of a shorter length DFT. 

This result has a nice interpretation in the context of 
generalized Fourier transforms, the 2"-WHT being itself the 
Fourier transform on the finite group (Z/2Z)" instead of 
Z/2"Z for the classical 2"-points DFT ||32l, ^. A similar 
result holds for DFT on any toric finite group ll37l . 



V. Application: Fading channel estimation in 

MULTI-OUTPUT SYSTEMS 

A. Channel model 

1) Physical assumptions: A linear time-invariant channel 
is characterized by its impulse response h. In mobile com- 
munications, channels are transient, but we may assume the 
channel to be locally invariant around a time r. This leads to 
the definition of a time-dependent channel impulse response 
hr- Consider a channel impulse response made of a large 
number L of echoes: 



/i,(t)-E"'(^)^(*-^'(^))- 



(12) 



1=1 



The number of echoes L, is usually far too large to warrant a 
finite rate of innovation approach. However individual echoes 
aggregate in a smaller and manageable number of clusters 
K 1431 . The rationales behind clustering are the same as for 
the common support assumption: a finite bandwidth combined 
with background noise allow only for a limited resolution. 
Table |T] lists a few examples for which clustering applies in 
typical operating conditions. 

This simplification is at the heart of medium and narrow- 
band wireless communications 1 44 1 . We want to estimate hr 
by sending probes at the input and collecting samples at the 
output. 

Correlation of the channel with respect to time is an 
important feature to exploit, however we will not consider 
it, as scheduling in modern communication systems makes its 
usage uncertain. Hence we settle on a time r and drop it from 
the notation. 

Communication is carried over a restricted frequency band, 
which is achieved by pulse-shaping with a template function 
Lp{t) and modulation by e^"'^*. Applying clustering to ( fT2l i the 
channel impulse response becomes: 



K 



hit) = J2 ^Mt - tk) 



k=l 



s.t. Cfc = akZk = e^'^"*' 



{ai,ti)eCk 



jbjc{tl-tk) 



(13) 



(14) 



where Z^ has unit-variance, at is the appropriate scal- 
ing parameter and Ck is the fc*'' cluster. Assuming 
{a;e-'"'^(*'^*'')}(Q,j jj-jgCj, contains i.i.d. elements with finite 
first two moments (echoes of finite energy) 



lim 



Zk^Mc{0,l). 



This is the classical non line of sight fading scenario 
where the paths amplitudes \Zk\ are independently Rayleigh 
distributed. 

2} Multipoint communications, one to many: Communica- 
tion through fading channels rely on spatial diversity to gain 
robustness. Spatial diversity is achieved with the deployment 
of several antennas at the receiver and/or transmitter We 
describe a spatial channel model between one transmitter and 
several receivers, which generalizes to MIMO communications 
in a straightforward manner. The physical properties of the 
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channel are the following as shown in Figure [3] 

• The distance in between antennas m and n is dm,n- 

• Each path is characterized by an angle of arrival (AoA) 
9k. To simplify computations it is assumed that the AoA 
is the same for all antennas (far field assumption). In 
the near field, a scatterer surrounds the receiver and 
the distribution becomes almost isotropic. Hence this 
assumption can be made for both regimes with limited 
error. 

• The direction normal to the segment between antennas m 
and n points toward azimuth dm.7i- 



k^^ scatterer 



TABLE III 
SIMULATION PARAMETERS 




Parameter 


Symbol 


Value 


Sampling step 


T 


50ns 


Bandwidth 


B 


20MHz 


Center frequency 


fc 


2.6GHz 


Frame duration (without padding) 


T 


25.55/is 


Samples per frame 


-^^ frame 


511 


Pilots per frame 


N 


63 


Pilot gap 


D 


8 


Delay spread 


A 


1.6/xs 



where Al/al w (1 



-3Kfc/4 



)Kk, J(.) is the Bessel function 



of the first kind and /(.) is the modified Bessel function of the 
first kind. 

Proof See appendix lAl It is only a close approximation 
since the azimuthal distribution at the receiver is approximated 
by a Von-Mises distribution. ■ 

Corollary 2. For a path width Kk large enough: 



Fig. 3. Channel model with a single scatterer. Each scatterer is characterized 
by its apparent width (width/distance) o-fe/Afe and its azimuth 9fe. This model 
is considered vahd in the near field as well, as the scatterer surrounds the 
receiver, thus having no intrinsic azimuth. 

The channel model in dTSl ) applies to the P subchannels 

K 



E [Zk,mZl,^ 



Jo ( dm,n] 

2^/2^^//..(0^^(^c^m,«) (17) 

K 



1=1 

cos U I — t 



''-2 



hm{t) = ^a,n,kZm,k(p{t ~ tni.k) , m = 1 , . . . , P. (15) such that f^^ is the centered Gaussian pdf of variance Kk- 



fc=i 
We assume the distance in between antennas is smaller than 
the achievable spatial resolution, hence 

tl,k — t2,k = ■ ■ ■ = tp^k- 

To fuUy characterize the channel, the path correlation across 
antennas must be known — by assumption Zrn,k and Zm'.k' 
are independent for k 7^ k'. Following Salz and Winters 1451 
we derive a formula for the autocorrelation matrix of Zk = 
[Zi.fc • • • Zp^k]- However we put a Gaussian prior on the cluster 
shape rather than a uniform one with discontinuities at the 
boundaries. 

3) Spatial correlation of paths: As in |45|, a large number 
of reflections are assumed to be drawn from a continuous 
probability distribution for each scatterer 

Proposition 6. Under the spatial channel model described in 
Section \V-A2\ the antenna crosscorrelation is closely approx- 
imated by: 



^[Zk,mZln\ — Jo { d„i^nj 



^ loink) V c 



1=1 



■ cos I 



(16) 



The result is in its form similar to |45|, however the 
derivation stays closer to the original physical model. 

VI. Numerical results 

For simulations we use the channel model developed in 
Section [Vl and choose its parameters to loosely follow the 
3GPP-LTE standard. Its characteristics are listed in Table |III] 
We assume 63 pilots which are uniformly spaced in frequency, 
one every 8. The transmitted frame is circularly padded such as 
to guarantee circular convolution of the transmitted signal with 
the CIR. Results are derived from three different experiments: 
A The medium has two paths separated by 2T. The second 
path's expected amplitude is 1/10*'' of the expected 
amplitude of the first path. The receiver possesses 1, 2, 4 
or 8 uncorrected antennas. The channels have exact SCS 
(e = 0). 
B The medium has two paths separated by T or 2T. Both 
paths have the same expected amplitude. The receiver has 
4 uncorrected antennas. The channels have either exact 
SCS (£ = 0) or non-exact SCS (e = T/50 = Ins). The 
discrepancy in the ToA between antennas is uniformly 
distributed in [— e e]. A time lapse of 2T/50 corresponds 
to a path length difference of 60 cm. 
This experiment is more realistic from a physical stand- 
point. The receiver has 5 antennas equispaced on a circle 
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(b) second path 
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Fig. 4. (Exp. A ) For the same global input SNR, a system with more antennas estimates the ToAs more accurately and is more resilient to noise. This is 
a consequence of the increased receiver diversity. The second path has I/IO"" the ampHtude of the first path and is thus quickly buried into noise as SNR 
decreases. The estimation reaches the Cramer-Rao bound as long as it correctly identifies the path. 
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Fig. 5. (Exp. A ) Part (a) shows the performances of Block ESPRIT-TLS with or without Block Cadzow denoising. In this setup, the gain obtained with 
the denoising is relatively small and is achieved after one iteration. Part (b) shows the performances of Block Prony-TLS with or without Block Cadzow 
denoising. As expected, the performance of Prony's algorithm without denoising is very poor. After 3 denoising iterations, performances of Block Prony-TLS 
and Block ESPRIT-TLS are indistinguishable. 



of radius 10 cm. The propagation medium contains 4 
scatterers (Figure ??.(a)). The expected CIR modulus 
is represented in Figure ??.(b). We use the spatial cor- 
relation model derived in Proposition |6] and provide 
the antennas cross-correlation in Figure ??.(c). Also the 
channel is not exactly SCS, with a maximum delay 
e ^ T/50 = Ins. 

Results were obtained on 400 independent noise and fading 
realisations. 

4) Results on Exp. A : Figure g] shows that the SCS- 
FRJ algorithm efficiently estimates the ToA down to a certain 
SNR where the recovery breaks down. This breaking point 
is pushed lower as spatial diversity increases, which is to be 
expected. Figure |5] compares the use and combination of the 
various subspace identification techniques discussed earlier. 
The conclusion is that the performances of Block-ESPRIT 
TLS or Block-Prony TLS are exactly the same on a signal 
denoised with the Block-Cadzow algorithm. However Block- 
ESPRIT TLS requires fewer to none Block-Cadzow iterations 



than Block-Prony TLS to reach the optimum. It is well-known 
that Prony TLS is not robust to noise |[lOl, ||29l . 

5) Results on Exp. B : Figure |6] shows that the single path 
CRB given in Proposition[3]is a good approximation of the true 
bound computed via Proposition|4]for multiple paths separated 
by more than twice the inverse bandwidth of the channel. This 
experiment also verifies the usefulness of the SCS assumption 
when ToAs are slightly perturbed from one antenna to another: 



tk.p — tk + E, 



k.p 



^fc,p-U([-ee]), i.i.d. 



The error caused by the random perturbation E^^p is of the 
order of the perturbation itself, and thus we may say SCS-FRI 
is robust on non exact SCS channels. 

6) Results on Exp. C : All estimation algorithms use the 
fact that the delay spread is much shorter than the frame 
length. The difference between lowpass interpolation and other 
techniques is the use of the sparsity property. Using this 
property alone, the SER is halved at a SNR of 5dB as shown 
in Figure |7] The addition of the SCS property proves to be 
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Fig. 6. (Exp. B ) This figure sliows that the proposed algorithms behave as expected in the presence of ToA mismatches between antennas. Part (b) motivates 
the separability assumption to compute the CRB of paths located more than 2T apart, while Part (a) shows its inadequacy for a smaller delay T. The "true" 
estimate is obtained via Monte-Carlo simulations. 
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Fig. 7. (Exp. C ) Using the SCS property, the SER is decreased by a factor 5 above lOdB of SNR compared to the conventional non-parametric approach. 
Spai'sity alone provides a significant SER improvement, which shall be combined with the common support property below 30dB of SNR. At very high SNR, 
independent channel estimation across antennas become preferable as the channels only approximately have the common support property. However, below 
15dB of SNR the effect of this approximation are undetectable. Another advantage of SCS-FRI is the reduction of pilots, it allows to halve their number 
while retaining performances superior to lowpass interpolation. 



valuable, at 5dB of SNR the SER is decreased by a factor 
3. At high SNR, the SCS property provides a factor 5 of 
improvement over lowpass interpolation. At very high SNR 
the error due to the approximate SCS nature of the channels 
diminishes this gain, and eventually the SCS assumption 
becomes detrimental. 

It also shows that the number of pilots can be halved 
while having SER performances superior to the non-parametric 
approach (we retained half of the original pilots closest to the 
carrier frequency). For lowpass interpolation, this cannot be 
done without introducing aliasing. Reducing the number of 
pilots below "Nyquist" is relevant at high SNR where little 
redundency is required for denoising, leaving some additional 
spectrum for data transmission. In favorable transmission 
conditions, it would be possible to reduce the number of pilots 
down to the rate of innovation of the channel to maximize the 
data throughput. 



VII. CONCLUSION 

We outlined the SCS-FRI algorithm, studied its perfor- 
mances on SCS channels estimation and computed theoretical 
lower-bounds for comparison. A spatial channel model was 
proposed for simulation purposes. The algorithm takes full 
advantage of the main properties of outdoor multipath channels 
and is directly applicable to most OFDM based communica- 
tion standards. Simulations indicate that SCS-FRI based on 
the Block-ESPRIT TLS routine seems to be the most suitable 
since it requires only two partial SVD with size of the model 
order and provides optimal accuracy. 

Future work is needed for estimation of the model order, 
incorporation of temporal correlation in the model and the 
algorithm (tracking of the model parameters). Computational 
complexity is also a crucial point for mobile applications, 
and improvements could be made with Krylov subspaces 



BARBOTIN et al.: ESTIMATION OF SPARSE MIMO CHANNELS WITH COMMON SUPPORT. 



11 



techniques as in 



Appendix A 
Spatial correlation formula for fading channels 

A. Azimuthal scatterers density distribution 

The reflection density of each scatterer is normally dis- 
tributed with mean /i.^, (its position) and covariance matrix 
crp (its "girth"). The number of reflections within a scatterer 
is assumed to be large enough to warrant their approximation 
by their continuous probability density function. The azimuthal 
density is the integral of the scatterer's pdf over F^ the straight 
path from the receiving antenna at an anglq^ "&'■ 

Pi^ifJ'k^'^k)^ f^Ji^\x - n^)I^er^dx. (18) 



Reparametrization in polar coordinates yield: 

P(^;Mfc:0-fc) =/o-?,(ll/^fcl|sin(z9)) 

L?ir- ||Mfc||cos(i?))J^(r,^)dr, 



-.-V 



+ 00 



kJ^ 003(19) 



/cj, sin(t9) 

<jj:'f{r-\\fi,\\cosm 



(s + w kJ. cos{'d))a%ds 



such that K'f. = ||/i,j.||^/cr^. and Jx{r, i3) ~ r is the Jacobian of 
the cartesian to polar transformation. We performed the change 
of variable s — r — ^/k^ cos{ii}) . Hence, the distribution has 
only one degree of freedom, and after some calculus; 

P<W=/(\/4sin^) (19) 

'kJ, cos?? • F{J k'^ cos'd)f{J k!^ cosi?) . 

The circular distribution ( fT9] l is well approximated by a Von- 
Mises distribution of scale Kk'. 

^Kf^ COS ^ 

(20) 



27rio(^fcj 



where In is the order modified Bessel function of the first 



kind. Asymptotically, n'f. 



Kk, and the approximation 

'*fc ~ (1 ~ e~^''*^/'*)Kfc was found to be empirically accurate 
for all Kk (K-L divergence between p^/ and q^k is less than 
0.02 bits). 



B. Derivation of the correlation matrix formula 
Considering the setup of Figure [3] and from P31 : 



Rf[m,n] 



X^ + e^^n~Ok)e^^''-"''^''d^. 



-Without loss of generality the scatterer origin is at azimuth 0, and the 
antenna is located at position 



Then, q^^^ is expanded in terms of spherical harmonics via the 
Jacobi-Anger expansion ||47l (9.1): 

1 




27r/o(Kfe) 

00 
^ Il{Kk) COs[l{-d + e„t,n 



1=1 
1 1 

2^ 



7I'/o(k/c) 



1 = 1 



where the second equality is obtained with Ii{jx) 
11471(9.6.3,9.1.35). 






We now have a series for W^ [m, n] with l^^ term: 



(a) 
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(b) 2Il{Kk) 

h{nk) V c 

(c) 2fll{Kk) ^ (UJ, 



Io{Kk) 



l {j—dm,nj cos I [Ok 
V C 



Equality (a) is obtained with some standard trigonometric 
identities and a shift by —^ of the variable of integration. 
Equality (6) follows from the standard integral representation 
of Ii ( 1471 9.6.19). The second integrand is antisymmetric 
which leads the integral over the unit-circle to vanish. Finally 
(c) is a consequence of Ii{jx) — fJi{x) again. Hence; 



R 



(k) 



[m,n] =Jo (^— dm,„j 



loii^k) 



Y^fMnk)Ji{^c 
1=1 ^ 

(Ok - 
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